Designing a Task-Based Evaluation Methodology for a Spoken Machine Translation System
Abstract
In this paper, I discuss issues pertinent to the design of a task-based evaluation methodology for a spoken machine translation (MT) system processing human-to-human communication rather than human-to-machine communication. I claim that system-mediated human-to-human communication requires new evaluation criteria and metrics based on goal complexity and the speaker's prioritization of goals.

1 Introduction

Task-based evaluations for spoken language systems focus on evaluating whether the speaker's task is achieved, rather than evaluating utterance translation accuracy or other aspects of system performance. Our MT project focuses on the travel reservation domain and facilitates on-line translation of speech between clients and travel agents arranging travel plans. Our prior evaluations (Gates et al., 1996) have focused on end-to-end translation accuracy at the utterance level (i.e., the fraction of utterances translated perfectly, acceptably, and unacceptably). While this method of evaluation conveys translation accuracy, it does not give any information about how many of the client's travel arrangement goals have been conveyed, nor does it take into account the complexity of the speaker's goals and task, or the priority that they assign to their goals. For example, the same end-to-end score for two dialogues may hide the fact that in one dialogue the speakers were able to communicate their most important goals, while in the other they were able to communicate only the less important goals. One common approach to evaluating spoken language systems focusing on human-machine dialogue is to compare system responses to correct reference answers; however, as discussed by Walker et al. (1997), the set of reference answers for any particular user query is tied to the system's dialogue strategy.
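The contrast drawn above, where two dialogues share the same utterance-level score but differ in which goals were communicated, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the `Goal` class, the numeric priorities, the example goals, and the priority-weighted score are hypothetical and are not the metric proposed in the paper.

```python
from dataclasses import dataclass


@dataclass
class Goal:
    name: str
    priority: float  # hypothetical speaker-assigned weight; larger means more important
    conveyed: bool   # whether the goal was communicated successfully


def utterance_accuracy(grades):
    """Fraction of utterances graded 'perfect' or 'acceptable'."""
    ok = sum(1 for g in grades if g in ("perfect", "acceptable"))
    return ok / len(grades)


def weighted_goal_success(goals):
    """Priority-weighted fraction of goals conveyed (illustrative only)."""
    total = sum(g.priority for g in goals)
    return sum(g.priority for g in goals if g.conveyed) / total


# Assume both dialogues received the same utterance-level grades.
grades = ["perfect", "acceptable", "unacceptable", "acceptable"]

# Dialogue A loses only the least important goal.
dialogue_a = [
    Goal("book flight", 3.0, True),
    Goal("reserve hotel", 2.0, True),
    Goal("ask about tours", 1.0, False),
]

# Dialogue B loses the most important goal.
dialogue_b = [
    Goal("book flight", 3.0, False),
    Goal("reserve hotel", 2.0, True),
    Goal("ask about tours", 1.0, True),
]

print(utterance_accuracy(grades))
print(weighted_goal_success(dialogue_a))
print(weighted_goal_success(dialogue_b))
```

Under these assumed weights the two dialogues have identical utterance accuracy, yet the weighted goal score separates them, which is the kind of distinction an utterance-level measure cannot express.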
Evaluation methods independent of dialogue strategy have focused on measuring the extent to which systems for interactive problem solving aid users via log-file evaluations (Polifroni et al., 1992), quantifying repair attempts via turn correction ratio, tracking user detection and correction of system errors (Hirschman and Pao, 1993), and considering transaction success (Shriberg et al., 1992). Danieli and Gerbino (1995) measure the dialogue module's ability to recover from partial failures of recognition or understanding (i.e., implicit recovery) and inappropriate utterance ratio; Simpson and Fraser (1993) discuss applying turn correction ratio, transaction success, and contextual appropriateness to dialogue evaluations; and Hirschman et al. (1990) discuss using task completion time as a black-box evaluation metric. Current literature on task-based evaluation methodologies for spoken language systems primarily focuses on human-computer interactions rather than system-mediated human-human interactions. For a multilingual MT system, speakers communicate via the system, which translates their responses and generates the output in the target language via speech synthesis. Measuring solution quality (Sikorski and Allen, 1995), transaction success, or contextual appropriateness is meaningless, since we are not interested in measuring how efficient travel agents are in responding to clients' queries, but rather, how well the system conveys the speakers' goals. Likewise, task completion time will not capture task success for MT dialogues since it is dependent on dialogue strategies and speaker styles. Task-based evaluation methodologies for
Similar resources
The Role of Semantics in Spoken Dialogue Translation Systems
In this paper, we consider the role of semantics in spoken dialogue translation systems. We begin by looking at some of the key properties of an existing spoken dialogue system, namely the SUNDIAL system, which provides flight and train information over the telephone, and how these properties affect the design methodology and functionality of spoken translation systems. These properties include...
Development of a QFD-based expert system for CNC turning centre selection
Computer numerical control (CNC) machine tools are automated devices capable of generating complicated and intricate product shapes in shorter time. Selection of the best CNC machine tool is a critical, complex and time-consuming task due to availability of a wide range of alternatives and conflicting nature of several evaluation criteria. Although, the past researchers had attempted to select ...
I2R's machine translation system for IWSLT 2009
In this paper, we describe the system and approach used by the Institute for Infocomm Research (I2R) for the IWSLT 2009 spoken language translation evaluation campaign. Two kinds of machine translation systems are applied, namely, a phrase-based machine translation system and a syntax-based machine translation system. To test syntax-based machine translation system on spoken language translation, va...
NUDT machine translation system for IWSLT2007
In this paper, we describe our machine translation system which was used for the Chinese-to-English task in the IWSLT2007 evaluation campaign. The system is a statistical machine translation (SMT) system, while containing an example-based decoder. In this way, it will help to solve the re-ordering problem and other problems for spoken language MT, such as lots of omissions, idioms etc. We repor...
Publication date: 1999